Application of Stacked Generalization to a Protein Localization Prediction Task
Abstract
A stacked generalization data mining approach was applied to a simplified version of the KDD Cup 2001 protein localization task. Four level-0 models were developed: an Artificial Neural Network, a Decision Tree, a Nearest Neighbor Classifier, and a Hybrid model that trained an Artificial Neural Network using those inputs selected as important by the Decision Tree. These models were developed from both a random sampling of the training dataset and a sampling designed to ensure an equal distribution of the localization variable. The predictions of these level-0 models were used to develop three level-1 generalizers: an Artificial Neural Network, a Decision Tree, and a Naïve Bayesian Classifier. The accuracy rates of the various models and generalizers suggest a modestly improved performance when using stacked generalization. Among the level-0 models, the Nearest Neighbor and the Hybrid performed slightly better than the Artificial Neural Network and Decision Tree. The performance of the three level-1 generalizers was roughly equivalent. Overall, both level-0 models and level-1 generalizers achieved better accuracy when equal distribution sampling was not performed.
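The two-level arrangement described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the level-0 models here are a 1-nearest-neighbor classifier and a nearest-centroid classifier (stand-ins for the neural network, decision tree, and hybrid models, which the abstract does not specify in detail), and the level-1 generalizer is a categorical naive Bayes classifier trained on out-of-fold level-0 predictions. The toy data, the class labels, and every function name (`toy_data`, `fit_stack`, and so on) are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_neighbor(train_X, train_y):
    """Level-0 model: 1-nearest-neighbor on squared Euclidean distance."""
    def predict(x):
        best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
        return train_y[best]
    return predict

def centroid_classifier(train_X, train_y):
    """Level-0 stand-in model: pick the class whose mean vector is closest."""
    sums, counts = {}, Counter(train_y)
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return lambda x: min(centroids, key=lambda c: sq_dist(centroids[c], x))

def naive_bayes(meta_X, meta_y):
    """Level-1 generalizer: categorical naive Bayes with add-one smoothing."""
    classes = sorted(set(meta_y))
    prior = Counter(meta_y)
    # cond[j][c][v]: how often level-0 model j predicted v when the truth was c
    cond = [defaultdict(Counter) for _ in meta_X[0]]
    for row, c in zip(meta_X, meta_y):
        for j, v in enumerate(row):
            cond[j][c][v] += 1
    def predict(row):
        def score(c):
            p = prior[c] / len(meta_y)
            for j, v in enumerate(row):
                p *= (cond[j][c][v] + 1) / (prior[c] + len(classes))
            return p
        return max(classes, key=score)
    return predict

def fit_stack(X, y, level0_builders, level1_builder, k=5, seed=0):
    """Train level-1 on out-of-fold level-0 predictions, then refit level-0."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    meta_X = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        tr = [i for i in idx if i not in held]
        models = [b([X[i] for i in tr], [y[i] for i in tr])
                  for b in level0_builders]
        for i in held_out:
            meta_X[i] = [m(X[i]) for m in models]
    level1 = level1_builder(meta_X, y)
    level0 = [b(X, y) for b in level0_builders]  # refit on all training data
    return lambda x: level1([m(x) for m in level0])

def toy_data(n_per_class=30, seed=1):
    """Hypothetical, well-separated 2-D clusters; not the KDD Cup 2001 data."""
    rng = random.Random(seed)
    centers = {"nucleus": (0.0, 0.0), "cytoplasm": (6.0, 0.0),
               "mitochondria": (0.0, 6.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y

X, y = toy_data()
predict = fit_stack(X, y, [nearest_neighbor, centroid_classifier], naive_bayes)
accuracy = sum(predict(x) == t for x, t in zip(X, y)) / len(y)
print(f"training-set accuracy of the stacked model: {accuracy:.2f}")
```

Training the level-1 generalizer on held-out predictions, rather than on predictions the level-0 models make for their own training rows, is the usual way to keep the generalizer from simply learning to trust an overfit level-0 model. Whether the paper used cross-validated predictions or a separate partition is not stated in the abstract.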